V2: Primary Aims Analyses in a Prototypical SMART

Workflow and Code for Three Types of Primary Aims

Authors

Jamie Yap

Mason Ferlic

John J. Dziak

Daniel Almirall

1 The Motivating Study

Our motivating study is the ADHD SMART (PI: Pelham), an example of a prototypical SMART. A schematic of the study design is also available in the handout V2_handout_motivating_study.pdf.

2 Setup

# This is a simulated dataset which mimics the 
# most salient characteristics of the original
# dataset
dat_adhd <- read.csv("data/adhd-simulated-2023.csv")

2.1 Examine simulated data

Baseline covariates:

  • ID subject identifier

  • odd Oppositional Defiant Disorder diagnosis, reflecting whether the child was (coded as 1) or was not (coded as 0) diagnosed with ODD before the first-stage intervention.

  • severity ADHD score, reflecting ADHD symptoms at the end of the previous school year (larger values reflect greater symptoms). Range 0-10.

  • priormed medication prior to first-stage intervention, reflecting whether the child did (coded as 1) or did not (coded as 0) receive medication during the previous school year.

  • race white (coded 1) versus non-white (coded 0).

Intermediate covariates:

  • R response status. R = 0 if child was classified as non-responder to first stage intervention, R = 1 if they were classified as a responder.

  • NRtime month at which the child was classified as a non-responder. Range 2-8; coded as NA for responders.

  • adherence adherence to the stage 1 intervention. Reflecting whether the child did (coded as 1) or did not (coded as 0) show high adherence to initial treatment.

Treatments:

We use contrast coding (effect coding) to denote the two levels of treatment assignment. The primary benefit of effect coding is that we get interpretable estimates of both the main effects and interactions.

  • A1 stage 1 treatment assignment. Randomized with probability \(0.5\) to Medication (MED, \(A1=-1\)) or Behavioral Intervention (BMOD, \(A1=1\)).

  • A2 stage 2 treatment assignment for non-responders. Non-responders were randomized with probability \(0.5\) to receive Augmented (AUG, \(A2=-1\)) or Intensified (INT, \(A2=1\)) care. A2 is coded as NA for responders.
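A quick toy illustration of why effect coding yields interpretable main effects (hypothetical coefficient values, not estimates from the study):

```r
# Toy illustration of effect coding with A1 in {-1, +1}
b0 <- 3.0   # intercept: the average of the two group means
b1 <- 0.5   # half the main effect of A1
mean_bmod <- b0 + b1 * (+1)          # model-implied mean under A1 = +1 (BMOD)
mean_med  <- b0 + b1 * (-1)          # model-implied mean under A1 = -1 (MED)
main_effect <- mean_bmod - mean_med  # equals 2 * b1
main_effect
```

With this coding, the intercept is the average over both first-stage options, and twice the coefficient on A1 is the main effect, which is exactly how the contrast matrices later in this document are constructed.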

Outcomes:

  • Y0 baseline school performance (higher values reflect better performance).

  • Y1 mid-year school performance.

  • Y2 end-of-year school performance (this variable is the primary outcome for each primary aim).

head(dat_adhd)
ID odd severity priormed race Y0 A1 R NRtime adherence Y1 A2 Y2 cell
1 1 2.88 0 1 2.32 -1 0 4 0 2.79 1 0.598 C
2 0 4.13 0 0 2.07 1 1 NA 1 2.20 NA 4.267 D
3 1 5.57 0 1 1.00 -1 1 NA 0 2.29 NA 1.454 A
4 0 4.93 0 1 3.23 1 0 4 0 3.05 -1 6.762 E
5 1 5.50 0 1 1.48 1 0 6 0 1.73 -1 3.580 E
6 0 5.50 0 1 1.72 1 0 3 0 2.40 1 2.075 F

3 Analysis code for Primary Aim Type 1

3.1 Regression model

\[ E\left[Y_2(A_1) | \mathbf{X}\right] = \beta_0 + \beta_1 A_{1} + \beta_2 X_{1c} + \beta_3 X_{2c} + \beta_4 X_{3c} + \beta_5 X_{4c} \]

3.2 Step-by-step workflow and R code syntax

# Create a copy of the original dataset
dat_smart <- dat_adhd

Colored shapes indicate the primary comparison of interest

Step 1. Contrast code stage-1 and stage-2 treatment indicators (done for you) and center each baseline covariate by its own grand mean

dat_smart <- dat_smart %>%
  # Center each baseline covariate by its own grand mean
  mutate(odd_c = odd - mean(odd),                 # this creates X_1c
         severity_c = severity - mean(severity),  # this creates X_2c
         priormed_c = priormed - mean(priormed),  # this creates X_3c
         race_c = race - mean(race))              # this creates X_4c
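A quick sanity check on the centering step, sketched with toy data (not the ADHD values) since it relies only on arithmetic: each grand-mean-centered covariate should itself have a mean of (numerically) zero.

```r
# Toy data standing in for the ADHD dataset
toy <- data.frame(odd = c(0, 1, 1, 0), severity = c(2, 4, 6, 9))
toy$odd_c      <- toy$odd - mean(toy$odd)
toy$severity_c <- toy$severity - mean(toy$severity)
colMeans(toy[, c("odd_c", "severity_c")])  # both (numerically) zero
```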

Step 2. Estimate parameters in regression model

We use Generalized Estimating Equations (GEE) to estimate the model for the mean outcome under each first-stage intervention option. To obtain robust (sandwich) standard errors, simply set std.err = "san.se" in the geeglm function.

# geeglm is a function in the geepack package
# analogous to PROC GENMOD in SAS
model <- geeglm(Y2 ~ A1 + odd_c + severity_c + priormed_c + race_c, 
                id = ID, 
                data = dat_smart, 
                # ask the computer to give you robust standard errors
                std.err = "san.se") 
summary(model)

Call:
geeglm(formula = Y2 ~ A1 + odd_c + severity_c + priormed_c + 
    race_c, data = dat_smart, id = ID, std.err = "san.se")

 Coefficients:
            Estimate Std.err   Wald Pr(>|W|)    
(Intercept)   2.8894  0.1365 447.77   <2e-16 ***
A1            0.3758  0.1471   6.53    0.011 *  
odd_c        -0.6609  0.2903   5.18    0.023 *  
severity_c   -0.0695  0.0688   1.02    0.312    
priormed_c   -0.1827  0.3469   0.28    0.598    
race_c        0.5632  0.3926   2.06    0.151    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Correlation structure = independence 
Estimated Scale Parameters:

            Estimate Std.err
(Intercept)     2.79   0.289
Number of clusters:   150  Maximum cluster size: 1 

\[ \begin{split} \widehat{E}\left[Y_2(A_1) | \mathbf{X}\right] &= \widehat{\beta_0} + \widehat{\beta_1} A_1 \\ &+ \widehat{\beta_2} X_{1c} + \widehat{\beta_3} X_{2c} + \widehat{\beta_4} X_{3c} + \widehat{\beta_5} X_{4c} \end{split} \]

Correspondence between output and equation

Quantity Value
\(\widehat{\beta_0}\) 2.8894
\(\widehat{\beta_1}\) 0.3758
\(\widehat{\beta_2}\) -0.6609
\(\widehat{\beta_3}\) -0.0695
\(\widehat{\beta_4}\) -0.1827
\(\widehat{\beta_5}\) 0.5632

Step 3. Estimate key quantities of interest

Our key quantities of interest are typically the mean of \(Y_2\) under each first-stage intervention option and the main effect of the first-stage intervention options.

# The code below creates a matrix L with three rows and six columns.
L <- rbind(
  # The 1st line can be thought of as a set of instructions
  #  to tell the computer to calculate b0+b1
  "Mean Y2 under A1=+1 (BMOD)"   = c(1,  1, 0, 0, 0, 0),
  # The 2nd line can be thought of as a set of instructions
  #  to tell the computer to calculate b0-b1
  "Mean Y2 under A1=-1 (MED)"    = c(1, -1, 0, 0, 0, 0),
  # The 3rd line can be thought of as a set of instructions 
  # to tell the computer to calculate 2*b1
  "Main effect using full sample" = c(0,  2,  0, 0, 0, 0))
# If one wishes to estimate more quantities, 
# one may simply add a new row to L
# before triggering the execution of the  
# estimation step (i.e., the next code snippet).
# This new row will have to have exactly six columns 
# since our regression model of interest has exactly 
# six parameters (including the intercept term).
print(L)
                              [,1] [,2] [,3] [,4] [,5] [,6]
Mean Y2 under A1=+1 (BMOD)       1    1    0    0    0    0
Mean Y2 under A1=-1 (MED)        1   -1    0    0    0    0
Main effect using full sample    0    2    0    0    0    0
# This line triggers the execution of the estimation of 
# the key quantities of interest.
# This step is similar to using ESTIMATE or
# LSMEANS statements in SAS.
est_contrasts <- estimate(model, L)
print(est_contrasts)
                              Estimate 95% LCL 95% UCL    SE p-value    
Mean Y2 under A1=+1 (BMOD)      3.2653  2.8793  3.6512 0.197  <1e-04 ***
Mean Y2 under A1=-1 (MED)       2.5136  2.1128  2.9143 0.204  <1e-04 ***
Main effect using full sample   0.7517  0.1750  1.3284 0.294  0.0106 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Workflow for Primary Aim Type 1 is completed.

This output says that:

  • The estimated mean of \(Y_2\) under \(A_1=+1\) (BMOD) is 3.2653
  • The estimated mean of \(Y_2\) under \(A_1=-1\) (MED) is 2.5136
  • The estimated main effect of first-stage intervention options is 0.7517
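The contrast arithmetic can be reproduced by hand from the coefficient table above (values as printed, so the results match the estimate() output only up to rounding):

```r
# Coefficients as printed in the summary above (rounded to 4 decimals)
beta <- c(2.8894, 0.3758, -0.6609, -0.0695, -0.1827, 0.5632)
L <- rbind("Mean Y2 under A1=+1 (BMOD)"    = c(1,  1, 0, 0, 0, 0),
           "Mean Y2 under A1=-1 (MED)"     = c(1, -1, 0, 0, 0, 0),
           "Main effect using full sample" = c(0,  2, 0, 0, 0, 0))
drop(L %*% beta)  # 3.2652, 2.5136, 0.7516
```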

4 Analysis code for Primary Aim Type 2

4.1 Regression model

\[ E\left[Y_2(A_2) | \mathbf{X}, R = 0\right] = \beta_0 + \beta_1 A_{2} + \beta_2 X_{1cNR} + \beta_3 X_{2cNR} + \beta_4 X_{3cNR} + \beta_5 X_{4cNR} \]

4.2 Step-by-step workflow and R code syntax

# Create a copy of the original dataset
dat_smart <- dat_adhd

Step 1. Contrast code stage-1 and stage-2 treatment indicators (done for you) and center each baseline covariate by its own mean among non-responders

dat_smart_nonresponders <- dat_smart %>%
  # In all subsequent steps, 
  # we only retain non-responders' observations
  filter(R == 0) %>%   
  # Center each baseline covariate by its own 
  # mean among non-responders
  mutate(odd_cNR = odd - mean(odd),                 # this creates X_1cNR
         severity_cNR = severity - mean(severity),  # this creates X_2cNR
         priormed_cNR = priormed - mean(priormed),  # this creates X_3cNR
         race_cNR = race - mean(race))              # this creates X_4cNR
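The order of operations matters here: filter first, then center, so that the centering uses the non-responders' own mean. A base-R sketch of the same logic on toy data (not the ADHD values):

```r
toy <- data.frame(R = c(1, 1, 0, 0), severity = c(2, 4, 6, 8))
toy_nr <- subset(toy, R == 0)  # retain non-responders only
toy_nr$severity_cNR <- toy_nr$severity - mean(toy_nr$severity)
toy_nr$severity_cNR  # -1, 1: centered at the non-responder mean (7), not the grand mean (5)
```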

Step 2. Estimate parameters in regression model

model <- geeglm(Y2 ~ A2 + odd_cNR + severity_cNR + priormed_cNR + race_cNR, 
                id = ID, 
                data = dat_smart_nonresponders, 
                # Remember to ask the computer to give 
                # you robust standard errors
                std.err = "san.se") 
summary(model)

Call:
geeglm(formula = Y2 ~ A2 + odd_cNR + severity_cNR + priormed_cNR + 
    race_cNR, data = dat_smart_nonresponders, id = ID, std.err = "san.se")

 Coefficients:
             Estimate Std.err   Wald Pr(>|W|)    
(Intercept)    2.7151  0.1643 273.06   <2e-16 ***
A2            -0.4200  0.1692   6.16    0.013 *  
odd_cNR       -0.8406  0.3486   5.81    0.016 *  
severity_cNR   0.0454  0.0893   0.26    0.611    
priormed_cNR  -0.2806  0.3578   0.62    0.433    
race_cNR       0.4007  0.5242   0.58    0.445    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Correlation structure = independence 
Estimated Scale Parameters:

            Estimate Std.err
(Intercept)     2.75   0.297
Number of clusters:   101  Maximum cluster size: 1 

\[ \begin{split} \widehat{E}\left[Y_2(A_2) | \mathbf{X}\right] &= \widehat{\beta_0} + \widehat{\beta_1} A_2 \\ &+ \widehat{\beta_2} X_{1cNR} + \widehat{\beta_3} X_{2cNR} + \widehat{\beta_4} X_{3cNR} + \widehat{\beta_5} X_{4cNR} \end{split} \]

Correspondence between output and equation

Quantity Value
\(\widehat{\beta_0}\) 2.7151
\(\widehat{\beta_1}\) -0.4200
\(\widehat{\beta_2}\) -0.8406
\(\widehat{\beta_3}\) 0.0454
\(\widehat{\beta_4}\) -0.2806
\(\widehat{\beta_5}\) 0.4007

Step 3. Estimate key quantities of interest

Our key quantities of interest are typically the mean of \(Y_2\) among non-responders under each second-stage intervention option and the main effect of the second-stage intervention options among non-responders.

L <- rbind(
  # The 1st line can be thought of as a set of instructions to tell the
  # computer to calculate b0+b1
  "Mean Y2 under A2=+1 (Intensify)"   = c(1,  1, 0, 0, 0, 0),
  # The 2nd line can be thought of as a set of instructions to tell the
  # computer to calculate b0-b1
  "Mean Y2 under A2=-1 (Augment)"    = c(1, -1, 0, 0, 0, 0),
  # The 3rd line can be thought of as a set of instructions to tell the
  # computer to calculate 2*b1
  "Main effect using non-responders" = c(0,  2,  0, 0, 0, 0))
print(L)
                                 [,1] [,2] [,3] [,4] [,5] [,6]
Mean Y2 under A2=+1 (Intensify)     1    1    0    0    0    0
Mean Y2 under A2=-1 (Augment)       1   -1    0    0    0    0
Main effect using non-responders    0    2    0    0    0    0
est_contrasts <- estimate(model, L)
print(est_contrasts)
                                 Estimate 95% LCL 95% UCL    SE p-value    
Mean Y2 under A2=+1 (Intensify)    2.2952  1.8181  2.7722 0.243  <1e-04 ***
Mean Y2 under A2=-1 (Augment)      3.1351  2.6881  3.5821 0.228  <1e-04 ***
Main effect using non-responders  -0.8399 -1.5032 -0.1767 0.338  0.0131 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Workflow for Primary Aim Type 2 is completed.

This output says that:

  • The estimated mean of \(Y_2\) under \(A_2=+1\) (Intensify) is 2.2952
  • The estimated mean of \(Y_2\) under \(A_2=-1\) (Augment) is 3.1351
  • The estimated main effect of second-stage intervention options among non-responders is -0.8399
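As with Primary Aim Type 1, these quantities follow directly from the coefficient table (values as printed, so they match the estimate() output only up to rounding):

```r
b0 <- 2.7151; b1 <- -0.4200  # intercept and A2 coefficient from the summary above
c(intensify = b0 + b1, augment = b0 - b1, main_effect = 2 * b1)
# 2.2951, 3.1351, -0.8400
```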

5 Analysis code for Primary Aim Type 3

5.1 Regression model

\[ E\left[Y_2(A_1, A_2) | \mathbf{X}\right] = \beta_0 + \beta_1 A_{1} + \beta_2 A_2 + \beta_3 A_1 A_2 + \beta_4 X_{1c} + \beta_5 X_{2c} + \beta_6 X_{3c} + \beta_7 X_{4c} \]

5.2 Step-by-step workflow and R code syntax

# Create a copy of the original dataset
dat_smart <- dat_adhd

Step 1. Contrast code stage-1 and stage-2 treatment indicators (done for you) and center each baseline covariate by its own grand mean

dat_smart <- dat_smart %>%
  # Center each baseline covariate by its own grand mean
  mutate(odd_c = odd - mean(odd),                 # this creates X_1c
         severity_c = severity - mean(severity),  # this creates X_2c
         priormed_c = priormed - mean(priormed),  # this creates X_3c
         race_c = race - mean(race))              # this creates X_4c

Step 2. Create weights

The actual numeric value of the weights (i.e., 2 and 4) can be obtained “by hand” by calculating the inverse (reciprocal) of the probability of being assigned to a particular adaptive intervention.

  • For responders, \(W = 1 \big/ p_1 = 1 \big/ \left(\tfrac{1}{2}\right) = 2\)
  • For non-responders, \(W = 1 \big/ \left(p_1 p_2\right) = 1 \big/ \left(\tfrac{1}{2} \cdot \tfrac{1}{2}\right) = 4\)

where \(p_1\) is the stage-1 randomization probability and \(p_2\) is the stage-2 randomization probability.
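A minimal sketch of this calculation (randomization probabilities of 0.5 at each stage, as stated in the design):

```r
p1 <- 0.5  # probability of each stage-1 option
p2 <- 0.5  # probability of each stage-2 option (non-responders only)
w_responder     <- 1 / p1         # responders are randomized once: weight 2
w_non_responder <- 1 / (p1 * p2)  # non-responders are randomized twice: weight 4
c(w_responder, w_non_responder)
```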
# We need to create a new column that contains 
# the weight that we will assign to
# non-responders and responders in the dataset
dat_smart <- dat_smart %>%
  mutate(design_weights = if_else(R == 1, 
                                  2, # when R == 1 is TRUE
                                  4  # when R == 1 is FALSE
                                  ))     

Step 3. Create new dataset with replicated rows for responders

We restructure the original dataset so that, instead of one observation per responder, the new dataset includes two copies of each responder’s observation. Non-responders’ observations remain intact in the replication step.

We first save non-responders’ observations into a dataset of their own.

# Save non-responders' observations 
# into the variable rows_not_to_replicate
dat_rows_not_to_replicate <- dat_smart %>% filter(R==0)

Restructure the dataset: STEP 1

Recall that A2 was coded as NA for responders in the original dataset. The key to the replication step is to replace this NA with +1 in the first copy and with -1 in the second copy.

We next save responders’ observations into a dataset of their own.

# Save responders' observations into the variable rows_to_replicate
dat_rows_to_replicate <- dat_smart %>% filter(R==1)

Restructure the dataset: STEP 2

# In the next two lines of code, we create two copies of rows_to_replicate.
# For the first copy (see next line) assign a value of +1 to A2.
dat_plus_one_pseudodata <- dat_rows_to_replicate %>% mutate(A2 = +1)
# For the second copy (see next line) assign a value of -1 to A2.
dat_minus_one_pseudodata <- dat_rows_to_replicate %>% mutate(A2 = -1)
# This is the new dataset where each responder
# has 2 rows and each non-responder has 1 row
dat_smart_replicated <- rbind(dat_rows_not_to_replicate,
                              dat_plus_one_pseudodata,
                              dat_minus_one_pseudodata) %>%
  arrange(ID) # crucial: arrange the dataset by ID so geeglm can identify the "clusters"
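The same replicate-and-recode logic, sketched in base R on toy data (not the ADHD values) to show the expected bookkeeping:

```r
toy <- data.frame(ID = 1:4, R = c(1, 0, 1, 0), A2 = c(NA, 1, NA, -1))
resp      <- subset(toy, R == 1)
plus_one  <- transform(resp, A2 = +1)   # first copy of each responder
minus_one <- transform(resp, A2 = -1)   # second copy of each responder
toy_rep <- rbind(subset(toy, R == 0), plus_one, minus_one)
toy_rep <- toy_rep[order(toy_rep$ID), ] # arrange by ID, as with arrange(ID)
nrow(toy_rep)      # 6: two rows per responder, one per non-responder
table(toy_rep$ID)  # IDs 1 and 3 (the responders) appear twice
```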

Restructure the dataset: STEP 3

# Inspect a couple of rows to verify that each responder has
# two rows (which are exact copies of each other, 
# except for the value of A2!)
# and a weight of 2
dat_smart_replicated %>% 
  filter(R == 1) %>% 
  select(ID, odd_c, severity_c, priormed_c, race_c,
         A1, R, A2, design_weights, Y2, cell) %>%
  head(., n = 4)
ID odd_c severity_c priormed_c race_c A1 R A2 design_weights Y2 cell
2 -0.407 -0.638 -0.3 -0.847 1 1 1 2 4.27 D
2 -0.407 -0.638 -0.3 -0.847 1 1 -1 2 4.27 D
3 0.593 0.798 -0.3 0.153 -1 1 1 2 1.45 A
3 0.593 0.798 -0.3 0.153 -1 1 -1 2 1.45 A

Sanity check: Responders

# Inspect a couple of rows to verify that each non-responder has
# one row and a weight of 4
dat_smart_replicated %>% 
  filter(R == 0) %>% 
  select(ID, odd_c, severity_c, priormed_c, race_c,
         A1, R, A2, design_weights, Y2, cell) %>%
  head(., n = 4)
ID odd_c severity_c priormed_c race_c A1 R A2 design_weights Y2 cell
1 0.593 -1.891 -0.3 0.153 -1 0 1 4 0.598 C
4 -0.407 0.161 -0.3 0.153 1 0 -1 4 6.762 E
5 0.593 0.732 -0.3 0.153 1 0 -1 4 3.580 E
6 -0.407 0.727 -0.3 0.153 1 0 1 4 2.075 F

Sanity check: Non-responders

Center covariates before weighting and replicating, not after. In other words, if Step 1 were performed after Steps 2 and 3, the estimates of the treatment effect would not necessarily be correct!

Step 4. Estimate parameters in regression model

Reminder: regression model \[ \begin{split} E\left[Y_2(A_1, A_2) | \mathbf{X}\right] &= \beta_0 + \beta_1 A_{1} + \beta_2 A_2 + \beta_3 A_1 A_2 \\ &+ \beta_4 X_{1c} + \beta_5 X_{2c} + \beta_6 X_{3c} + \beta_7 X_{4c} \end{split} \]

dat_smart_replicated <- dat_smart_replicated %>% arrange(ID)

model <- geeglm(Y2 ~ A1 + A2 + I(A1*A2) 
                        + odd_c + severity_c + priormed_c + race_c,
                # specify which column contains the participant ID's
                id = ID,  
                # remember to use the weighted and replicated dataset
                data = dat_smart_replicated,  
                # remember to weight each row appropriately
                weights = design_weights,  
                # ask the computer to treat replicates as independent units
                corstr = "independence",  
                # ask the computer to give you robust standard errors
                std.err = "san.se") 
# Inspect parameter estimates
summary(model)

Call:
geeglm(formula = Y2 ~ A1 + A2 + I(A1 * A2) + odd_c + severity_c + 
    priormed_c + race_c, data = dat_smart_replicated, weights = design_weights, 
    id = ID, corstr = "independence", std.err = "san.se")

 Coefficients:
            Estimate Std.err   Wald Pr(>|W|)    
(Intercept)   2.9142  0.1326 483.31   <2e-16 ***
A1            0.4209  0.1415   8.85   0.0029 ** 
A2           -0.3473  0.1110   9.79   0.0018 ** 
I(A1 * A2)   -0.1070  0.1111   0.93   0.3352    
odd_c        -0.6989  0.2871   5.92   0.0149 *  
severity_c   -0.0694  0.0662   1.10   0.2950    
priormed_c   -0.1278  0.3400   0.14   0.7069    
race_c        0.5673  0.3739   2.30   0.1292    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Correlation structure = independence 
Estimated Scale Parameters:

            Estimate Std.err
(Intercept)     2.66   0.262
Number of clusters:   150  Maximum cluster size: 2 

\[ \begin{split} \widehat{E}\left[Y_2(A_1, A_2) | \mathbf{X}\right] &= \widehat{\beta_0} + \widehat{\beta_1} A_{1} + \widehat{\beta_2} A_2 + \widehat{\beta_3} A_1 A_2 \\ &+ \widehat{\beta_4} X_{1c} + \widehat{\beta_5} X_{2c} + \widehat{\beta_6} X_{3c} + \widehat{\beta_7} X_{4c} \end{split} \]

Correspondence between output and equation

Quantity Value
\(\widehat{\beta_0}\) 2.9142
\(\widehat{\beta_1}\) 0.4209
\(\widehat{\beta_2}\) -0.3473
\(\widehat{\beta_3}\) -0.1070
\(\widehat{\beta_4}\) -0.6989
\(\widehat{\beta_5}\) -0.0694
\(\widehat{\beta_6}\) -0.1278
\(\widehat{\beta_7}\) 0.5673

Step 5. Obtain estimated means under each embedded AI

Reminder: regression model \[ \begin{split} E\left[Y_2(A_1, A_2) | \mathbf{X}\right] &= \beta_0 + \beta_1 A_{1} + \beta_2 A_2 + \beta_3 A_1 A_2 \\ &+ \beta_4 X_{1c} + \beta_5 X_{2c} + \beta_6 X_{3c} + \beta_7 X_{4c} \end{split} \]

Reminder: how variables were coded

  • \(A_1 = 1\) (BMOD), \(A_1 = -1\) (MED)
  • \(A_2 = 1\) (INTENSIFY), \(A_2 = -1\) (AUGMENT)

Contrast coding logic: Mean under (MED, AUGMENT) \[ \begin{split} E\left[Y_2({\color{seagreen}{-1}}, {\color{seagreen}{-1}}) | \mathbf{X}\right] &= \beta_0 {\color{royalblue}{(1)}} + \beta_1 {\color{royalblue}{(-1)}} + \beta_2 {\color{royalblue}{(-1)}} + \beta_3 {\color{royalblue}{(1)}} \\ &+ \beta_4 {\color{royalblue}{(0)}} + \beta_5 {\color{royalblue}{(0)}} + \beta_6 {\color{royalblue}{(0)}} + \beta_7 {\color{royalblue}{(0)}} \end{split} \]

Contrast coding logic: Mean under (BMOD, AUGMENT) \[ \begin{split} E\left[Y_2({\color{seagreen}{1}}, {\color{seagreen}{-1}}) | \mathbf{X}\right] &= \beta_0 {\color{royalblue}{(1)}} + \beta_1 {\color{royalblue}{(1)}} + \beta_2 {\color{royalblue}{(-1)}} + \beta_3 {\color{royalblue}{(-1)}} \\ &+ \beta_4 {\color{royalblue}{(0)}} + \beta_5 {\color{royalblue}{(0)}} + \beta_6 {\color{royalblue}{(0)}} + \beta_7 {\color{royalblue}{(0)}} \end{split} \]

Correspondence between output and equation

Quantity Value
\[ \widehat{E}\left[Y_2({\color{seagreen}{-1}}, {\color{seagreen}{-1}}) | \mathbf{X}\right] \] 2.734
\[ \widehat{E}\left[Y_2({\color{seagreen}{1}}, {\color{seagreen}{-1}}) | \mathbf{X}\right] \] 3.789
\[ \widehat{E}\left[Y_2({\color{seagreen}{-1}}, {\color{seagreen}{1}}) | \mathbf{X}\right] \] 2.253
\[ \widehat{E}\left[Y_2({\color{seagreen}{1}}, {\color{seagreen}{1}}) | \mathbf{X}\right] \] 2.881
# L is user-specified with
# -- number of rows = number of AI's
# -- number of columns = number of parameters in regression model
L <- rbind(
  # These statements get the contrast corresponding to
  # the mean end-of-study outcome under each embedded AI
  "AI#1 (MED, AUGMENT)"    = c(1, -1, -1,  1, rep(0,4)),
  "AI#2 (BMOD, AUGMENT)"   = c(1,  1, -1, -1, rep(0,4)),
  "AI#3 (MED, INTENSIFY)"  = c(1, -1,  1, -1, rep(0,4)),
  "AI#4 (BMOD, INTENSIFY)" = c(1,  1,  1,  1, rep(0,4)))
# The function `estimate` has two user-specified arguments:
# -- fit: an output of geeglm, the function we used to 
#         estimate the parameters of our regression model
#         of interest
# -- combos: contrasts of interest, encoded into a matrix
est_contrasts <- estimate(fit = model, combos = L)

print(est_contrasts)
                       Estimate 95% LCL 95% UCL    SE p-value    
AI#1 (MED, AUGMENT)       2.734   2.303   3.164 0.220  <1e-04 ***
AI#2 (BMOD, AUGMENT)      3.789   3.348   4.231 0.225  <1e-04 ***
AI#3 (MED, INTENSIFY)     2.253   1.694   2.812 0.285  <1e-04 ***
AI#4 (BMOD, INTENSIFY)    2.881   2.367   3.394 0.262  <1e-04 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
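The four estimated means can again be reproduced by hand from the coefficient table in Step 4 (printed values, so they match the estimate() output up to rounding):

```r
beta <- c(2.9142, 0.4209, -0.3473, -0.1070)  # b0, b1, b2, b3 from the summary above
L4 <- rbind("AI#1 (MED, AUGMENT)"    = c(1, -1, -1,  1),
            "AI#2 (BMOD, AUGMENT)"   = c(1,  1, -1, -1),
            "AI#3 (MED, INTENSIFY)"  = c(1, -1,  1, -1),
            "AI#4 (BMOD, INTENSIFY)" = c(1,  1,  1,  1))
round(drop(L4 %*% beta), 3)  # 2.734, 3.789, 2.253, 2.881
```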

Step 6: Obtain estimated effect

Correspondence between output and equation

Quantity Value
\[ \begin{split}&\widehat{E}\left[Y_2({\color{seagreen}{1}}, {\color{seagreen}{-1}}) | \mathbf{X}\right] - \widehat{E}\left[Y_2({\color{seagreen}{-1}}, {\color{seagreen}{-1}}) | \mathbf{X}\right] \\&= \widehat{\beta_1} {\color{royalblue}{(2)}} + \widehat{\beta_3} {\color{royalblue}{(-2)}} \\\end{split} \] 1.056

Let’s suppose that the primary pairwise comparison of interest is the causal effect of (BMOD, AUGMENT) versus (MED, AUGMENT).

D <- rbind(
  # This statement obtains the contrast corresponding to
  # the difference in end-of-study mean between 
  # (BMOD, AUGMENT) and (MED, AUGMENT)
  "(BMOD, AUGMENT) vs. (MED, AUGMENT)" = c(0, 2, 0, -2, rep(0,4))
  )
est_more_contrasts <- estimate(fit = model, combos = D)

print(est_more_contrasts)
                                   Estimate 95% LCL 95% UCL    SE p-value    
(BMOD, AUGMENT) vs. (MED, AUGMENT)   1.0558  0.4406  1.6711 0.314 0.00077 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
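The pairwise difference matches \(2\widehat{\beta}_1 - 2\widehat{\beta}_3\) computed from the printed coefficients:

```r
b1 <- 0.4209; b3 <- -0.1070  # from the Step 4 summary
2 * b1 - 2 * b3              # 1.0558
```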

5.3 Visualize 95% CI’s

Visual check: does the 95% CI cross zero (horizontal dashed blue line)?

Workflow for Primary Aim Type 3 is completed.

6 Knowledge Check

If we included response status (R) in our regression model for Primary Aim Type 3 as in the model below, may we still interpret \(2 \beta_1 - 2 \beta_3\) as the causal effect of (BMOD, AUGMENT) versus (MED, AUGMENT)?

\[ E\left[Y_2(A_1, A_2) | \mathbf{X}, R\right] = \beta_0 + \beta_1 A_{1} + \beta_2 A_2 + \beta_3 A_1 A_2 + \beta_4 X_{1c} + \beta_5 X_{2c} + \beta_6 X_{3c} + \beta_7 R \]